WIP: Add status conditions for nodeClass #6438
/karpenter snapshot
Snapshot successfully published to
|
	launchTemplateProvider launchtemplate.Provider
}

func (n Readiness) Reconcile(ctx context.Context, nodeClass *v1beta1.EC2NodeClass) (reconcile.Result, error) {
Can we not just move these checks into the reconcile function of each resource, instead of having a controller for each field we change?
@@ -0,0 +1,120 @@
/*
Why do we want to have separate controllers for each resource?
pkg/controllers/nodeclass/status/instance_profile/controller.go
a6d733a
to
bb128ee
Compare
func (c *Controller) Reconcile(ctx context.Context, nodeClass *v1beta1.EC2NodeClass) (reconcile.Result, error) {
	ctx = injection.WithControllerName(ctx, "nodeclass.ami")

	if !controllerutil.ContainsFinalizer(nodeClass, v1beta1.TerminationFinalizer) {
Do we run into a potential race condition here with multiple controllers adding this termination finalizer? I.e. if multiple controllers make the read before any make the write, and multiple patches get applied.
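To make the concern concrete, here is a minimal, self-contained simulation (not Karpenter code) of two controllers that both read the object before either writes. The `store`, `object`, and `addFinalizer` names are hypothetical. The store mimics an API server's optimistic concurrency by rejecting writes that carry a stale resource version. Under that assumption, the second writer gets a conflict, re-reads, finds the finalizer already present, and writes nothing.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a hypothetical stand-in for a Kubernetes object's metadata.
type object struct {
	ResourceVersion int
	Finalizers      []string
}

var errConflict = errors.New("conflict: stale resource version")

// store mimics an API server that enforces optimistic concurrency.
type store struct{ current object }

// Get returns a copy so callers cannot mutate the stored object directly.
func (s *store) Get() object {
	out := s.current
	out.Finalizers = append([]string(nil), s.current.Finalizers...)
	return out
}

// Update rejects any write that is based on a stale read.
func (s *store) Update(o object) error {
	if o.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	o.ResourceVersion++
	s.current = o
	return nil
}

func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// addFinalizer adds name to a previously read object and retries from a
// fresh read on conflict. It reports how many writes it actually made.
func addFinalizer(s *store, read object, name string) (writes int, err error) {
	for attempt := 0; attempt < 3; attempt++ {
		if contains(read.Finalizers, name) {
			return writes, nil // already present, nothing to write
		}
		read.Finalizers = append(read.Finalizers, name)
		err = s.Update(read)
		if err == nil {
			return writes + 1, nil
		}
		if !errors.Is(err, errConflict) {
			return writes, err
		}
		read = s.Get() // stale read: fetch the latest version and try again
	}
	return writes, err
}

func main() {
	s := &store{}
	// Both controllers read before either one writes.
	readA, readB := s.Get(), s.Get()
	writesA, _ := addFinalizer(s, readA, "example.com/termination")
	writesB, _ := addFinalizer(s, readB, "example.com/termination")
	fmt.Println(writesA, writesB, s.Get().Finalizers)
}
```

A patch sent without a resource version would skip the conflict check in `Update`, which is the case this comment is asking about.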
	if errs != nil {
		return reconcile.Result{}, errs
	}
	if !nodeClass.StatusConditions().IsTrue(v1beta1.ConditionTypeAMIsReady) {
I'm not sure we should be re-reconciling immediately in this case. If this is caused by genuine user misconfiguration, we won't be able to resolve any additional AMIs unless the user updates their configuration (or does something like tag an existing AMI to match). If this is caused by an API outage, retrying immediately doesn't help us either. We're hitting internal caches, so it shouldn't cause a retry storm, but there's still no reason to retry immediately here IMO.
	}
	stored := nodeClass.DeepCopy()
	amis, err := c.amiProvider.List(ctx, nodeClass)
	var errs error
This multierr is only used to capture the error from amiProvider.List, can we just drop it?
		}
	}
	if !equality.Semantic.DeepEqual(stored, nodeClass) {
		if err = c.kubeClient.Status().Update(ctx, nodeClass); err != nil {
Why does this need to be an update rather than a patch? We're only updating the existing status condition, right? Do we expect any other controllers to be updating this status condition, which we could conflict with? Same comment for all other controllers.
I tried to follow what we have done with most of our controllers for status updates. Is there a benefit of using patch over update?
	res, err := reconciler.Reconcile(ctx, nodeClass)
	errs = multierr.Append(errs, err)
	results = append(results, res)
	if nodeClass.Spec.Role != "" {
nit: IMO it would be cleaner to pull this logic out into a helper function, and do away with the multierror. This is going to be a theme in my review, but we should avoid multierrors if they're not needed, since their use implies there will be multiple errors IMO.
func (c *Controller) Reconcile(...) {
	...
	instanceProfile, err := c.resolveInstanceProfile(ctx, nodeClass)
	if err != nil {
		nodeClass.StatusConditions().SetFalse(v1beta1.ConditionTypeInstanceProfileReady, "InstanceProfileCreateError", "Error creating instance profile")
	} else {
		nodeClass.Status.InstanceProfile = instanceProfile
		nodeClass.StatusConditions().SetTrue(v1beta1.ConditionTypeInstanceProfileReady)
	}
	if !equality.Semantic.DeepEqual(stored, nodeClass) {
		...
	}
	if err != nil {
		return reconcile.Result{}, err
	}
	...
}

// resolveInstanceProfile returns the user-specified instance profile when no
// role is set; otherwise it creates an instance profile for the role.
func (c *Controller) resolveInstanceProfile(ctx context.Context, nc *v1beta1.EC2NodeClass) (string, error) {
	if nc.Spec.Role == "" {
		return lo.FromPtr(nc.Spec.InstanceProfile), nil
	}
	return c.instanceProfileProvider.Create(ctx, nc)
}
@@ -12,62 +12,44 @@ See the License for the specific language governing permissions and
limitations under the License.
*/

-package status
+package instance_profile
Package names shouldn't include underscores, same comment for launch_template and security_group.
	stored := nodeClass.DeepCopy()

	securityGroups, err := c.securityGroupProvider.List(ctx, nodeClass)
	var errs error
Can we also rework this to not use multierror, maybe by moving it into a helper function? In general there's a lot of nesting here; moving it into a helper function with some early returns could help readability.
	if errs != nil {
		return reconcile.Result{}, errs
	}
	if !nodeClass.StatusConditions().IsTrue(v1beta1.ConditionTypeSecurityGroupsReady) {
As with AMIs, I'm not sure we should be re-reconciling immediately here. Is there a reason we would fail to resolve security groups once but succeed immediately after (without any changes to the NodeClass)?
	}
	stored := nodeClass.DeepCopy()
	subnets, err := c.subnetProvider.List(ctx, nodeClass)
	var errs error
Same comment I've given elsewhere about multierr and reducing nesting with a helper function.
	if errs != nil {
		return reconcile.Result{}, errs
	}
	if !nodeClass.StatusConditions().IsTrue(v1beta1.ConditionTypeSubnetsReady) {
Same comment I've given on other controllers about the immediate requeue.
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
Fixes #N/A
Description
This PR adds status conditions to EC2NodeClass.
How was this change tested?
/karpenter snapshot
Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.